Paged Vector

date: 2023-10-08
author: Peterino

A wild data structure appeared! - Paged Vectors

I haven't found an official name for them yet, but I have seen them in quite a few places. I'm going to call them "Paged Vectors". But the idea is simple. Let me walk you through a common issue with "normal" vectors (or dynamic arrays or arraylists) in programming.

We all know a normal vector is just an array of contiguous data. When you write to it it grows. but what happens when you reach the end of the vector?

What happens in 90% of dynamic array implementations goes a little like this:

  1. New buffer is allocated (usually 2x larger than the previous buffer).

  1. data is copied from the old buffer to the new one.

  1. old data is then invalidated and destroyed

so when we hit this resize event. Two extremely expensive operations are conducted. Both the allocation and the copy are potentially incredibly expensive for very large collections.

Not to mention during the interim in which both the old and new buffers are alive you're sitting on a lot of memory for no reason.

Layout of a paged vector

I first came across the idea after seeing how Unreal Engine internally stores visual log data. They didn't have it as a dedicated standalone data structure, rather it was embedded in some internal logging structure.

But it was a cool enough idea for me to spend an afternoon coding it up and documenting it.

The paged vector looks a little bit different than our vector. Basically its a "Vector of fixed sized vectors".

Rather than resizing the whole buffer that has been allocated, all we do is actually allocate a new fixed size buffer.

This has several interesting positives:

  1. Resizing is no longer an expensive allocate-copy-free, it's just an allocate.
  2. Because the buffers are fixed size, indexing is a constant-time modulo operation.
  3. Because most elements are contiguous, this is actually cache friendly for linear iteration.
  4. Memory usage growth is linear and can we can predict if we're going to oom in the future.
  5. Pages can be sized according to whatever is natural to the operating system to reduce fragmentation.

And there's one more tiny benefit:

POINTERS ARE NEVER INVALIDATED

If you're interested I got battle and unit tested implementation of it up at github:

https://github.com/peterino2/p2-algorithms/blob/integration/structures/paged-vector.zig

(It's zig, but you could convert this code to C, C++, or any other language very easily).